Peptide Folding Kinetics from Replica Exchange Molecular Dynamics 
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We show how accurate kinetic information, such as the rates of protein folding and unfolding, can 
be extracted from replica-exchange molecular dynamics (REMD) simulations. From the brief and 
continuous trajectory segments between replica exchanges, we estimate short-time propagators in 
conformation space and use them to construct a master equation. For a helical peptide in explicit 
water, we determine the rates of transitions both locally between microscopic conformational states 
and globally for folding and unfolding. We show that accurate rates in the ~1/(100 ns) to ~1/(1 ns) 
range can be obtained from REMD with exchange times of 5 ps, in excellent agreement with results 
from long equilibrium molecular dynamics. 



Replica exchange molecular dynamics (REMD) [J Q 
is a powerful method to enhance the conformational sam- 
pling, addressing a serious challenge in molecular simula- 
tions [3(. Multiple non-interacting copies (or "replicas") 
of the system are simulated in parallel, each at a different 
temperature. To transfer the barrier-crossing efficiency 
from runs at high temperature to those at low tempera- 
ture, configuration exchanges are attempted periodically 
(e.g., at time intervals #£remd) between replicas at differ- 
ent temperatures (Ti and Tj). Those exchange attempts 
are accepted with a Metropolis probability -Premd(* *-* 
j) = min{l, exp[(/3 3 — (ii){Uj — Ui)]} that enforces de- 
tailed balance and maintains canonical distributions at 
each temperature [with Ui the potential energy of the 
i-th replica, /3, — l/(fcsTi), and ks the Boltzmann con- 
stant]. After an accepted exchange, the particle velocities 
are appropriately re-scaled to the new temperature, or re- 
drawn from respective Maxwell-Boltzmann distributions. 
Through a series of exchanges, high-temperature confor- 
mations are transferred occasionally to low temperature 
runs, facilitating the exploration of new configuration- 
space regions. 

While enhancing the exploration of conformation 
space, REMD apparently does not permit the extraction 
of useful kinetic information. Conformation exchanges 
result in discontinuous trajectories, precluding the calcu- 
lation of equilibrium time correlation functions for times 
longer than the exchange time <5£remd- To improve 
the sampling efficiency of REMD, the shortest possible 
(JtREMD should be used 0|. With <5£remd much shorter 
than the time scales of slow conformational changes, the 
rates of conformational changes appear inaccessible to 
REMD simulations. To overcome this problem, at least 
for the special case of a two-state system, an indirect 
method has recently been proposed in which the two rate 
coefficients describing the assumed folding/unfolding dy- 
namics are assumed to obey an Arrhenius temperature 
dependence However, the protein- folding rate often 
exhibits non- Arrhenius temperature dependence [(|, and 
folding intermediates are common. To avoid the result- 
ing problems, master-equation approaches have been de- 
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scribed by Levy and co-workers [7( in a qualitative, yet 



insightful analysis. As a quantitative alternative, REMD 
has recently been used to estimate the local drift and 
diffusion coefficient s [a| w ithin the framework of coarse 
diffusion equations [9|, [lfj, . 

Here we show how one can efficiently extract accurate 
transition rates from REMD simulations, both locally be- 
tween microscopic conformational states and globally be- 
tween folded and unfolded conformations (and possible 
intermediates), without the assumption of a certain tem- 
perature dependence of the underlying kinetics. In fact, 
our method can be used to investigate the Arrhenius or 
non- Arrhenius character of a particular system. We de- 
termine short-time propagators in conformation space to 
overcome the problems arising from the intrinsically dis- 
continuous character of REMD trajectories [liL [Til]. 

We first realize that REMD permits the accurate (and 
formally exact) calculation of short-time correlation func- 
tions. The initial configurations after a replica exchange 
(with appropriate velocity assignment) constitute valid 
representatives of the equilibrium phase-space distribu- 
tions at the respective temperatures. From the subse- 
quent Hamiltonian dynamics until the next exchange, we 
can obtain exact correlation functions. The maximum 
time scale will be a few <5^remd , given by the longest 
time between accepted replica exchanges. 

Specifically, we here determine the frequency of transi- 
tions between conformational states. From the observed 
molecular transitions, we construct a master equation de- 
scribing the dynamics in a conformation space divided 
into N distinct states. We later verify that the dynamics 
in the resulting projected space is captured by a master 
equation, dPi(t)/dt = J2f=i ~ kjiPi(t)], where 

Pi(t) is the population in state i, and kij > is the tran- 
sition rate from j to i ^ j. In vector-matrix notation, we 
have dP(t)/dt = KP(t), where the N x N rate matrix 
K has off-diagonal elements fey and diagonal elements 
ku — — YljM kji < 0. The propagators, defined as the 
probability of being in state j at time t given that the 
system was in state i at time 0, can be written in terms of 
the matrix exponential, p(j, t\i,0) — [exp^Kt)]^. To es- 
timate the elements of the rate matrix K from either long 
equilibrium simulations or REMD, we use a maximum- 



likelihood procedure. We first determine the number Nji 
of transitions from state i to state j within a time in- 
terval At, irrespective of intermediate states. The log 
likelihood of observing transition numbers Nji 
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To obtain the rate coefficients of the master equation 
(with upper and lower diagonal elements related by de- 
tailed balance), we maximize m£ with respect to the hij 

Bill. 

Effects of non-Markovian dynamics not captured by 
the master equation result in a dependence of the rate 
matrix on the time interval At. Ultimately, for long lag 
times At, fast non-Markovian dynamics is effectively sup- 
pressed and the propagators are dominated by the slow 
transitions [ll|, [12, Il3| . However, if At is short, fast 
motions lead to improper assignments of conformational 
states. As a consequence, the extracted rate matrices 
tend to predict overly fast conformational relaxation. 

The problem of fast non-Markovian dynamics can be 
suppressed by assigning the states with the help of tran- 
sition paths that connect well-defined regions within two 
conformational cells (Fig. [T^-b). A new state is assigned 
only if the trajectory crosses from one well-defined region 
to another. Fast equilibrium fluctuations in the projected 
space thus do not lead to a state change. We showed pre- 
viously that for peptide folding in long standard molecu- 
lar dynamics (MD) simulations, this procedure gives ac- 
curate rate matrices for observation times At as short as 
1 ps [H. 

Here, we adapt this state-assignment procedure to 
REMD. In a first step, we follow each replica irrespec- 
tive of exchanges, and identify transition paths for these 
continuous trajectories to assign states. In a second step, 
transition numbers Nji for each of the REMD temper- 
atures are determined from the respective short trajec- 
tory segments uninterrupted by replica exchange. From 
the Nji, we then estimate the coefficients of the master 
equation through likelihood maximization. 

In the following, we demonstrate the general procedure 
to calculate slow rates from REMD with fast exchange. 
Master-equation approaches have been used extensi vely 
in peptide folding studies @, [H, 14, 
We used the GROMACS 3.3 package 



15 



17, h 



19( to run both 



standard MD and REMD simulations for the folding 
of a short helical peptide, blocked Alas (i.e., CH3CO- 
Ala5-NHCH3), in explicit water [U ED]. We used the 
AMBER-GSS force field ;22| ported to GROMACS 
with peptide (<&, 4") torsional potentials modified to re- 
produce experimental helix-coil equilibria 0] • Simulation 
details can be found in Ref . |l3 . 

Four independent MD and REMD runs were initiated 
from different configurations (11111 - 'all helix', 00000 - 
'all coil', 01010, and 10101, where 1 denotes a residue in 
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FIG. 1: REMD simulations, (a) Schematic of transition-path 
based assignment of conformational states, shown for illustra- 
tive purposes in ID (with the actual assignment done using 
both $ and $ [HI])- The backbone dihedral angle ^ of ala- 
nine exhibits transitions between helical (blue, \I/ < 0) and 
non-helical states (green, $ > 0). Conformations within nar- 
row regions around the two free energy minima (gray) can be 
assigned as helical or coil with high confidence. For other con- 
formations (black dots), the assigned state changes only if the 
trajectory crosses between the well-defined regions, but not 
on equilibrium excursions that revert without actual crossing, 
(b) State assignment corresponding to (a), (c) Temperatures 
sampled by a typical Ala5 replica during a 150 ns REMD 
simulation, (d) Conformational states sampled by the Ala5 
replica during the same run. 



the helical region of the Ramachandran map, ordered left 
to right from N to C terminus [Hj]). The reference MD 
runs covered 250 ns at two different temperatures (300 
and 350 K), for a total combined simulation time of 2 fj,s. 
The 150-ns REMD simulations used 12 replicas spanning 
the 295-350 K temperature range for a total combined 
simulation time of 600 ns per replica. Coordinates were 
saved every 1 ps and REMD exchanges were attempted 
every i^remd = 5 ps. Figure Hh shows that the result- 
ing REMD trajectories pass through the whole range of 
temperatures multiple times. Each individual trajectory 
also has a high likelihood to visit most, if not all, of the 
32 coarse-grained conformational states (Fig. QJl; with 
00000 and 11111 corresponding to states 1 and 32 in bi- 
nary notation plus 1). In the resulting master equation 
model, the transition rates kij arc different from zero 
only if states i and j in binary notation differ by at most 
one bit, producing the connectivity of a five-dimensional 
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FIG. 2: REMD equilibrium populations P eq at 300 and 350 
K. Shading indicates the folded basin. (Inset) Scatter plot 
of Pe q from standard MD and REMD at 300 K. Error bars 
indicate standard deviations of the mean. 




FIG. 3: Relaxation times T2 (circles, blue) and T3 (diamonds, 
green) as a function of temperature. Open squares show T2 
and T3 from standard MD at 300 and 350 K (Inset) 
Folding (kp) and unfolding (ku) rate constants as a function 
of 1/T. 



hypercube. 

Figure [2] shows the equilibrium populations in each 
of the 32 conformational states at 300 and 350 K from 
REMD trajectories. The inset illustrates the excellent 
agreement between equilibrium distributions from MD 
and REMD at 300 K. At 350 K, the sampling is more ef- 
ficient and the agreement even better (data not shown). 

Figure [3] demonstrates that the master equation accu- 
rately captures the kinetics. Shown are the two slowest 
relaxation times, ti and T3, at the 12 temperatures sam- 
pled in the REMD runs (where Ti = — 1/ Aj, with Aj the 
ordered eigenvalues of K). The REMD relaxation times 
agree perfectly with those obtained from standard MD 
runs at 300 and 350 K [l3[ . This agreement holds also for 
all relaxation times Ti (not shown for i > 4), and the in- 
dividual coefficients kij of the master equation, as shown 
in Fig. (with linear correlation coefficients > 0.94). 

From the slowest relaxation time T2, and the relative 
populations in the folded (helical) and unfolded (coil) 
state of the peptide, we estimate folding and unfolding 
rates as a function of temperature under the assumption 
of a two-state relaxation (Fig. [3] inset). The 32 states 
i were assigned as folded or unfolded based on the left- 
hand eigenvector of K corresponding to eigenvalue A2 
13, 24] (see Fig.©. The result ing folded basin consists of 
all structures with at least one a-helical (i,i+4) backbone 
hydrogen bond among the four N-terminal residues. Con- 
sistent with the assumptions of Ref. we find that the 
resulting folding and unfolding rates exhibit Arrhenius- 
like dependence on temperature. The activation ener- 
gies for folding and unfolding are E% w 22.1 kJ/mol and 
E% w 46.5 kJ/mol. 

A possible concern is the influence of fast non- 
Markovian dynamics not taken into account by the mas- 



ter equation model. We can explicitly probe for such 
effects by plotting the calculated relaxation times Ti as a 
function of the lag time At used to determine the prop- 
agators. Figure H shows that the relaxation times from 
REMD are independent of At from 1 to 10 ps (2<$£remd), 
and agree with the results from standard MD. 

We showed how accurate rates for the conformational 
dynamics of a molecular system can be extracted from 
REMD simulations. For a short helical peptide in water, 
the REMD kinetics was in perfect agreement with that 
from standard MD. The key elements of the procedure 
are (1) the suppression of non-Mar kovian noise by using 
transition paths in the assignment of states, (2) the cal- 
culation of transition numbers iVy on the time scale of 
replica exchanges, and (3) the construction of a master 
equation from the N t j using a maximum likelihood pro- 
cedure. The formalism is general, and can be adapted to 
Hamiltonian REMD 12511 . resolution exchange [26j , non- 
Boltzmann reservoirs [2l\. serial replica exchange [jjj ]. 
etc. 

In practical applications, such as protein folding, the 
combinatorial explosion in the number of states poses 
a major challenge for large systems. To reduce the 
dimension of the master equation, states could be de- 
fined by using conformational clustering [2§|, subsets of 
the dihedral-angle coordinates (that produce the most 
Mar kovian dynamics), or alternative coordinates such as 
native or non-native amino-acid contacts or contact frac- 
tions, the radius of gyration, or distances between key 
residues, with our formalism applicable to both discrete 
and continuous variables [T3|. In addition, hierarchical 
coarse graining [3fj| can be used to combine fine and 
coarse-grained master equations 0, [l2| ■ As a second chal- 
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FIG. 4: Validation of transition rates estimated from REMD 
trajectories, (a) Rates ktj from REMD versus those from 
standard MD [13] at 300 K (blue) and 350 K (red), (b) De- 
pendence of the relaxation times T2 (top, red), T3 (middle, 
green), and T4 (bottom, blue) on the lag time At at 300 K. 
REMD results are shown as symbols connected by solid lines. 
Reference values from standard MD are shown as dashed lines 
with error bars. Results for At > <5tREMD were obtained from 
continuous trajectory segments in which replica exchange at- 
tempts had been rejected. 



lenge, the need to collect sufficient transitions at all tem- 
peratures to construct a connected master equation could 
be overcome by assuming that the individual rates kij, 
but not necessarily the slow relaxations r, , satisfy an Ar- 
rhcnius law. In that way, transitions observed at higher 
temperatures can be used to estimate the relaxation time 
scales at lower temperatures, augmented by the accu- 
rate equilibrium populations of REMD through the re- 
quirement of detailed balance. Such a procedure is easily 
implemented within our likelihood-maximization frame- 
work by replacing the individual rates with temperature- 
independent prefactors and activation energies. 
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